Are Labels Needed for Incremental Instance Learning?
In this paper, we learn to classify visual object instances, incrementally
and via self-supervision (self-incremental). Our learner observes a single
instance at a time, which is then discarded from the dataset. Incremental
instance learning is challenging, since longer learning sessions exacerbate
forgetfulness, and labeling instances is cumbersome. We overcome these
challenges via three contributions: i. We propose VINIL, a self-incremental
learner that can learn object instances sequentially, ii. We equip VINIL with
self-supervision to bypass the need for instance labelling, iii. We compare
VINIL to label-supervised variants on two large-scale benchmarks, and show that
VINIL significantly improves accuracy while reducing forgetfulness.
Comment: Accepted at CVPRW on CLVISION (Oral)
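The abstract does not spell out VINIL's architecture. As a minimal, hypothetical sketch of the incremental-instance setting it describes — one example arrives, updates the model, and is then discarded — here is a nearest-mean classifier with running-average prototypes (all names are illustrative, not VINIL's):

```python
import math

class IncrementalNearestMean:
    """Classify by nearest running-mean prototype; each example is
    seen once and then discarded, so memory stays constant."""

    def __init__(self):
        self.means = {}   # label -> running mean vector
        self.counts = {}  # label -> number of examples seen

    def partial_fit(self, x, label):
        # Update the running mean for this label without storing x.
        if label not in self.means:
            self.means[label] = list(x)
            self.counts[label] = 1
        else:
            self.counts[label] += 1
            n = self.counts[label]
            m = self.means[label]
            for i, xi in enumerate(x):
                m[i] += (xi - m[i]) / n

    def predict(self, x):
        # Return the label whose prototype is closest in Euclidean distance.
        def dist(m):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(m, x)))
        return min(self.means, key=lambda lbl: dist(self.means[lbl]))

clf = IncrementalNearestMean()
for x, y in [([0.0, 0.1], "a"), ([0.2, 0.0], "a"),
             ([1.0, 0.9], "b"), ([0.9, 1.1], "b")]:
    clf.partial_fit(x, y)

print(clf.predict([0.1, 0.0]))  # nearest to the "a" prototype -> "a"
print(clf.predict([1.0, 1.0]))  # nearest to the "b" prototype -> "b"
```

A supervised variant uses the given labels as above; a self-supervised variant in the spirit of the paper would derive the grouping signal from the data itself rather than from annotations.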
Adaptation Strategies for Automated Machine Learning on Evolving Data
Automated Machine Learning (AutoML) systems have been shown to efficiently
build good models for new datasets. However, it is often not clear how well
they can adapt when the data evolves over time. The main goal of this study is
to understand the effect of data stream challenges such as concept drift on the
performance of AutoML methods, and which adaptation strategies can be employed
to make them more robust. To that end, we propose 6 concept drift adaptation
strategies and evaluate their effectiveness on different AutoML approaches. We
do this for a variety of AutoML approaches for building machine learning
pipelines, including those that leverage Bayesian optimization, genetic
programming, and random search with automated stacking. These are evaluated
empirically on real-world and synthetic data streams with different types of
concept drift. Based on this analysis, we propose ways to develop more
sophisticated and robust AutoML techniques.
Comment: 12 pages, 7 figures (14 counting subfigures), submitted to TPAMI - AutoML Special Issue
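The six strategies are not enumerated in the abstract. The sketch below illustrates one common strategy in this family — detect-and-retrain on a sliding window — with a toy majority-class learner and an abrupt synthetic drift; all names, window sizes, and thresholds are assumptions, not the paper's:

```python
import random

def run_stream(stream, train, predict, window=50, threshold=0.6):
    """Detect-and-retrain: monitor accuracy over a sliding window and
    refit the model on the most recent window when accuracy drops."""
    buffer, correct = [], []
    model = None
    retrains = 0
    for x, y in stream:
        if model is not None:
            correct.append(predict(model, x) == y)
            if len(correct) > window:
                correct.pop(0)
        buffer.append((x, y))
        if len(buffer) > window:
            buffer.pop(0)
        need_init = model is None and len(buffer) == window
        drifted = (model is not None and len(correct) == window
                   and sum(correct) / window < threshold)
        if need_init or drifted:
            model = train(buffer)
            correct.clear()
            retrains += 1
    return model, retrains

# Toy learner: predict the majority class of the training window.
def train(batch):
    labels = [y for _, y in batch]
    return max(set(labels), key=labels.count)

def predict(model, x):
    return model

# Synthetic stream with abrupt drift: the majority label flips at t=500.
random.seed(0)
stream = [(None, "a" if random.random() < 0.9 else "b") for _ in range(500)]
stream += [(None, "b" if random.random() < 0.9 else "a") for _ in range(500)]

model, retrains = run_stream(stream, train, predict)
print(model, retrains)
```

The same monitoring loop would wrap a full AutoML-built pipeline instead of the majority-class stub; the design question the paper studies is exactly which refit trigger and scope work best.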
Genetic automated machine learning assistant
GAMA is an AutoML package for end-users and AutoML researchers. It uses genetic programming to efficiently generate optimized machine learning pipelines given specific input data and resource constraints. A machine learning pipeline contains data preprocessing as well as a machine learning algorithm, with fine-tuned hyperparameter settings.
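GAMA's actual search space and operators are far richer. As an illustrative toy of the genetic-programming idea — evolving (preprocessing, model, hyperparameter) triples under selection and mutation — here is a self-contained sketch with a synthetic stand-in for cross-validated fitness; none of these names come from GAMA's API:

```python
import random

random.seed(1)

# Toy search space: a pipeline is [scaler, model, hyperparameter].
SCALERS = ["none", "standard", "minmax"]
MODELS = ["knn", "tree", "linear"]

def random_pipeline():
    return [random.choice(SCALERS), random.choice(MODELS), random.uniform(0, 1)]

def mutate(pipe):
    # Change one pipeline component at random.
    child = list(pipe)
    i = random.randrange(3)
    if i == 0:
        child[0] = random.choice(SCALERS)
    elif i == 1:
        child[1] = random.choice(MODELS)
    else:
        child[2] = min(1.0, max(0.0, child[2] + random.gauss(0, 0.1)))
    return child

def fitness(pipe):
    # Stand-in for cross-validated accuracy: a fixed synthetic landscape
    # that prefers a standard-scaled knn with hyperparameter near 0.7.
    score = 0.5
    if pipe[0] == "standard":
        score += 0.2
    if pipe[1] == "knn":
        score += 0.2
    score += 0.1 * (1 - abs(pipe[2] - 0.7))
    return score

def evolve(generations=30, pop_size=20):
    pop = [random_pipeline() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        pop = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(pop, key=fitness)

best = evolve()
print(best[:2], round(fitness(best), 3))
```

In GAMA the fitness function is real cross-validated performance under a time budget, and candidates are full scikit-learn pipelines rather than flat triples.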
Document type: Article
Experiment Databases: Creating a New Platform for Meta-Learning Research
Many studies in machine learning try to investigate what makes an algorithm succeed or fail on certain datasets. However, the field is still evolving relatively quickly, and new algorithms, preprocessing methods, learning tasks and evaluation procedures continue to emerge in the literature. Thus, it is impossible for a single study to cover this expanding space of learning approaches. In this paper, we propose a community-based approach for the analysis of learning algorithms, driven by sharing meta-data from previous experiments in a uniform way. We illustrate how organizing this information in a central database can create a practical public platform for any kind of exploitation of meta-knowledge, allowing effective reuse of previous experimentation and targeted analysis of the collected results.
Towards Meta-learning over Data Streams
Modern society produces vast streams of data. Many stream mining algorithms have been developed to capture general trends in these streams, and make predictions for future observations, but relatively little is known about which algorithms perform particularly well on which kinds of data. Moreover, it is possible that the characteristics of the data change over time, and thus that a different algorithm should be recommended at various points in time. Figure 1 illustrates this. As such, we are dealing with the Algorithm Selection Problem [9] in a data stream setting. Based on measurable meta-features from a window of observations from a data stream, a meta-algorithm is built that predicts the best classifier for the next window. Our results show that this meta-algorithm is competitive with state-of-the-art data stream ensembles, such as OzaBag [6], OzaBoost [6] and Leveraged Bagging [3].
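The meta-features and base learners used in the paper are not listed here. The toy sketch below (synthetic stream, illustrative names throughout) shows the windowed algorithm-selection loop: extract meta-features from the current window, let a 1-NN meta-learner recommend a base classifier for the next window, then record which classifier actually won:

```python
import math
import random

random.seed(2)

def meta_features(window):
    # Simple meta-features of a window of (x, label) pairs.
    xs = [x for x, _ in window]
    mean = sum(xs) / len(xs)
    spread = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    pos_rate = sum(y for _, y in window) / len(window)
    return (mean, spread, pos_rate)

# Two toy base classifiers, fit per window.
def fit_threshold(window):
    return lambda x: int(x > 0.5)

def fit_majority(window):
    maj = int(sum(y for _, y in window) * 2 >= len(window))
    return lambda x: maj

BASE = {"threshold": fit_threshold, "majority": fit_majority}

def accuracy(clf, window):
    return sum(clf(x) == y for x, y in window) / len(window)

def best_on(window):
    # Winner for a window: fit on the first half, score on the second.
    half = len(window) // 2
    return max(BASE, key=lambda n: accuracy(BASE[n](window[:half]), window[half:]))

def recommend(meta_db, feats):
    # 1-NN meta-learner over previously seen (features, winner) pairs.
    if not meta_db:
        return "threshold"
    nearest = min(meta_db,
                  key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], feats)))
    return nearest[1]

# Synthetic stream: two alternating regimes favoring different learners.
def make_window(regime, size=100):
    out = []
    for _ in range(size):
        x = random.random()
        # Regime 0: label follows the 0.5 threshold; regime 1: labels mostly 1.
        y = int(x > 0.5) if regime == 0 else int(random.random() < 0.9)
        out.append((x, y))
    return out

windows = [make_window(i % 2) for i in range(20)]

meta_db, hits = [], 0
for prev, nxt in zip(windows, windows[1:]):
    feats = meta_features(prev)
    choice = recommend(meta_db, feats)   # recommendation for the next window
    winner = best_on(nxt)                # which base learner actually won
    hits += choice == winner
    meta_db.append((feats, winner))
print(f"meta-learner picked the winner on {hits}/{len(windows) - 1} windows")
```

After the meta-database has seen one window of each regime, the nearest-neighbor lookup recommends the right learner for every subsequent window of this toy stream.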
OpenML: networked science in machine learning
Many sciences have made significant breakthroughs by adopting online tools
that help organize, structure and mine information that is too detailed to be
printed in journals. In this paper, we introduce OpenML, a place for machine
learning researchers to share and organize data in fine detail, so that they
can work more effectively, be more visible, and collaborate with others to
tackle harder problems. We discuss how OpenML relates to other examples of
networked science and what benefits it brings for machine learning research,
individual scientists, as well as students and practitioners.
Comment: 12 pages, 10 figures